Effective Data Visualisation with R
Introduction

Paul Murrell
The University of Auckland

Goals

  • What to draw:
    Provide a framework for reasoning about what makes an effective data visualisation.

  • How to draw:
    Learn to make effective data visualisations with {ggplot2}.

What to Draw

  • We want a framework so that we can make deliberate and rational decisions when designing a data visualisation.

  • When we create something bad, we want to know what we did wrong so that we can make it better (and not do the bad thing ever again).

  • When we create something good, we want to know what we did right so that we can do it again.

What to Draw

Assumptions & Limitations

  • This course focuses only on static graphics.

  • This course focuses on graphics for presentation; we have a message to convey.

  • We will only consider producing a data visualisation by writing code.

  • You should be familiar with R
    (or be able to make friends quickly).

Course Structure

  • 6 sessions
  • 2 parts per session (30 + 15)

Day 1

  1. Introduction and
    The visual system.

  2. {ggplot2} and
    Quantitative data.

  3. Qualitative data and
    Accuracy.

Day 2

  1. Combining features and
    Multiple features.

  2. Labels and
    Graphic design.

  3. Customisation and
    Review.

Introduction to Data Visualisation

  • What is data visualisation?

  • Terminology

  • Making use of the visual system

  • Mapping data to data symbols

What is Data Visualisation?

  • Data Visualisation is NOT Photography

An ultrasound of a beating heart

What is Data Visualisation?

  • Data Visualisation is NOT Scientific Visualisation

A 3D simulation of a beating heart

What is Data Visualisation?

  • Data Visualisation is NOT Art or Entertainment

A cute cartoon heart doing exercise

What is Data Visualisation?

  • A Data Visualisation is an artificial, abstract image
    that uses geometric shapes to represent data.

A line plot of the electrical activity of a human heart

Terminology

  • What do a scatter plot, a bar plot, and a pie chart have in common?

Terminology

  • Data symbols are the geometric shapes that represent data values (e.g., points and lines).

Terminology

  • Guides are elements of a data visualisation that explain how data values are represented (e.g., axes and legends).

  • Guides are themselves mini data visualisations.

Terminology

  • Labels provide context and background information about a data visualisation (e.g., titles and captions).

  • Not all text is a label.

A data visualisation is made up of data symbols, guides, and labels.

  • We will spend most of our time on the task of selecting data symbols for a data visualisation.

  • We will also spend some time towards the end on the labelling of data visualisations.

Making Use of the Visual System

Example Data: NZ Youth Crime

  • The data frame rates contains the rate of youth crime (the number of youth offenders per 10,000 of population), for each age (from 10 to 13) in each year (from 2010 to 2020).
head(rates)
  Age Year Rate
1  10 2010   66
2  11 2010  120
3  12 2010  211
4  13 2010  392
6  10 2011   62
7  11 2011  107
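
A minimal sketch of how a data frame like rates could be built by hand, assuming the values in the table of rates shown in this section. The names ages, years, and rate_matrix are illustrative and not taken from the course code; the row names will run from 1 to 44 rather than matching the output above.

```r
# Ages and years covered by the data (from the description above)
ages  <- 10:13
years <- 2010:2020

# One row per age, one column per year (values copied from the table of rates)
rate_matrix <- rbind(
  c( 66,  62,  65,  43,  41,  39,  29,  33,  22,  24,  16),
  c(120, 107, 107,  77,  82,  60,  62,  58,  46,  50,  38),
  c(211, 189, 166, 154, 133, 115, 116, 115,  97,  88,  76),
  c(392, 356, 293, 263, 230, 216, 195, 203, 176, 158, 144)
)

# Long format: one row per age/year combination.
# as.vector() reads the matrix column by column, i.e. all ages for 2010,
# then all ages for 2011, and so on.
rates <- data.frame(
  Age  = rep(ages, times = length(years)),
  Year = rep(years, each = length(ages)),
  Rate = as.vector(rate_matrix)
)

head(rates)
```

The long format (one row per observation) is the shape that {ggplot2} generally expects.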

Making Use of the Visual System

  • We use data visualisation to help answer questions.

  • Is the crime rate increasing or decreasing over time for each age group?

Age   2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
10      66   62   65   43   41   39   29   33   22   24   16
11     120  107  107   77   82   60   62   58   46   50   38
12     211  189  166  154  133  115  116  115   97   88   76
13     392  356  293  263  230  216  195  203  176  158  144
Total  197  179  160  137  122  107  100  101   83   79   69

Making Use of the Visual System

  • The visual system is good at answering some questions.

  • Is the crime rate increasing or decreasing over time for each age group?

A line plot of youth crime rates in New Zealand over time

Making Use of the Visual System

  • We use data visualisation to help answer questions.

  • Was the total crime rate higher in 2016 or 2017?

Age   2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
10      66   62   65   43   41   39   29   33   22   24   16
11     120  107  107   77   82   60   62   58   46   50   38
12     211  189  166  154  133  115  116  115   97   88   76
13     392  356  293  263  230  216  195  203  176  158  144
Total  197  179  160  137  122  107  100  101   83   79   69

Making Use of the Visual System

  • Our visual system is less good at answering some questions.

  • Was the total crime rate higher in 2016 or 2017?

A line plot of youth crime rates in New Zealand over time

Making Use of the Visual System

  • Our visual system is very good at answering some questions.

  • Was the total crime rate higher in 2016 or 2017?

An effective data visualisation takes advantage of the strengths of the visual system.

  • We will spend a lot of our time exploring the strengths and weaknesses of the visual system.

  • Asking the visual system to perform some calculations is asking too much.

  • We need to choose a data visualisation based on the question, not just on the data.
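
The 2016 versus 2017 question can be read as a request for a small subtraction. A sketch of that calculation in R, using the Total row from the table of rates earlier in this section (the names total, change, and increased are illustrative):

```r
# Total youth crime rate per year (Total row of the table of rates)
total <- c(197, 179, 160, 137, 122, 107, 100, 101, 83, 79, 69)
names(total) <- 2010:2020

# Year-on-year change, labelled by the later year of each pair
change <- setNames(diff(unname(total)), 2011:2020)

# Years in which the total rate went up compared with the previous year
increased <- names(change)[change > 0]
increased                                  # "2017"

# Size of the 2016 -> 2017 difference
unname(total["2017"] - total["2016"])      # 1
```

A difference of 1 on a scale that runs to nearly 400 is easy to read from a table or a calculation, but hard to judge from the slope of a line.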

Mapping Data to Data Symbols

  • Data symbols represent data values.

  • Our goal is to learn how to choose an effective mapping from data values to data symbols.

Mapping Data to Data Symbols

    Symbol:     points (circles) and lines
    Mappings:   year  ->  x-location
                rate  ->  y-location
                age   ->  colour

A line plot of youth crime rates in New Zealand over time
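
One way this mapping might be expressed with {ggplot2}. This is a sketch under stated assumptions, not necessarily the code behind the plot: the data frame is rebuilt from the table of rates so that the example is self-contained, and Age is converted to a factor so that colour is treated as discrete.

```r
library(ggplot2)

# Rebuild the data in long format (values from the table of rates)
rate_matrix <- rbind(
  c( 66,  62,  65,  43,  41,  39,  29,  33,  22,  24,  16),
  c(120, 107, 107,  77,  82,  60,  62,  58,  46,  50,  38),
  c(211, 189, 166, 154, 133, 115, 116, 115,  97,  88,  76),
  c(392, 356, 293, 263, 230, 216, 195, 203, 176, 158, 144)
)
rates <- data.frame(
  Age  = rep(10:13, times = 11),
  Year = rep(2010:2020, each = 4),
  Rate = as.vector(rate_matrix)
)

# year -> x-location, rate -> y-location, age -> colour
p_line <- ggplot(rates, aes(x = Year, y = Rate, colour = factor(Age))) +
  geom_line() +    # one line per age (grouped by the discrete colour)
  geom_point() +   # a point for each data value
  labs(colour = "Age")

print(p_line)
```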

Mapping Data to Data Symbols

    Symbol:     bars (rectangles)
    Mappings:   year  ->  x-location
                rate  ->  length
                age   ->  colour (and x-location)

A bar plot of youth crime rates in New Zealand over time
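
A possible {ggplot2} version of the bar mapping, again as a sketch rather than the course's own code. The dodged position is an assumption about how age contributes to x-location, and both Year and Age are converted to factors for that purpose.

```r
library(ggplot2)

# Rebuild the data in long format (values from the table of rates)
rate_matrix <- rbind(
  c( 66,  62,  65,  43,  41,  39,  29,  33,  22,  24,  16),
  c(120, 107, 107,  77,  82,  60,  62,  58,  46,  50,  38),
  c(211, 189, 166, 154, 133, 115, 116, 115,  97,  88,  76),
  c(392, 356, 293, 263, 230, 216, 195, 203, 176, 158, 144)
)
rates <- data.frame(
  Age  = rep(10:13, times = 11),
  Year = rep(2010:2020, each = 4),
  Rate = as.vector(rate_matrix)
)

# year -> x-location, rate -> bar length, age -> fill colour;
# position = "dodge" places the bars for each age side by side within a year,
# so age also affects x-location
p_bar <- ggplot(rates, aes(x = factor(Year), y = Rate, fill = factor(Age))) +
  geom_col(position = "dodge") +
  labs(x = "Year", fill = "Age")

print(p_bar)
```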

Mapping Data to Data Symbols

    Symbol:     tiles (rectangles)
    Mappings:   year  ->  x-location
                rate  ->  colour
                age   ->  y-location

A heat map of youth crime rates in New Zealand over time
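
The tile mapping might be sketched in {ggplot2} as follows. As with the other examples, the data frame is rebuilt from the table of rates and the default continuous colour scale is an assumption; the plot shown may use a different palette.

```r
library(ggplot2)

# Rebuild the data in long format (values from the table of rates)
rate_matrix <- rbind(
  c( 66,  62,  65,  43,  41,  39,  29,  33,  22,  24,  16),
  c(120, 107, 107,  77,  82,  60,  62,  58,  46,  50,  38),
  c(211, 189, 166, 154, 133, 115, 116, 115,  97,  88,  76),
  c(392, 356, 293, 263, 230, 216, 195, 203, 176, 158, 144)
)
rates <- data.frame(
  Age  = rep(10:13, times = 11),
  Year = rep(2010:2020, each = 4),
  Rate = as.vector(rate_matrix)
)

# year -> x-location, age -> y-location, rate -> fill colour
p_heat <- ggplot(rates, aes(x = Year, y = Age, fill = Rate)) +
  geom_tile()

print(p_heat)
```

The same three variables appear in all three plots; only the mapping from data values to data symbols changes.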

An effective data visualisation makes use of a good mapping from data to data symbols.

  • We will spend a lot of time talking about how to choose a good mapping.

Summary

  • A data visualisation consists of data symbols, guides, and labels.

  • A data visualisation can help to answer questions.

    • An effective data visualisation will pose questions that the visual system is good at answering.

  • We need to choose a mapping from data values to data symbols.

    • An effective data visualisation will have good mappings from data to data symbols.

Exercises

  • Can you identify the data symbols, guides, and labels in this image?

Source: 2023 Youth Justice Indicators Summary Report

Exercises

  • Can you identify the mappings (of data values to data symbols) that are used in this image?

  • The data visualisation below is from a graduate student project.

  • The task of interest is: identify the month(s) in which the prediction is higher than the actual (and vice versa).

  • Can you identify ways in which the visual system finds it either easy or difficult to answer this question?